ABSTRACT

A multi-tiered system for responding to natural language
queries is disclosed. The query resolution system
returns zero or more links to content that is relevant to the
user's query. The present invention for query resolution
combines two or more types of natural language query 
resolution methods, where the knowledge base for each of 
the methods comes from a single knowledge specification. 
The various different natural language query resolution 
methods differ fundamentally in how they match the user 
query to the web site content. The results of the resolution 
methods are ranked and all, some, or none of the results of 
each system may be displayed. 

Figure 2A

Figure 2B

<answer> = /0.7/ <How_do_I_get_paid_to_surf_the_web> {ActionTag1} | /0.3/ <How_does_eStore_make_money> {ActionTag2};

<How_do_I_get_paid_to_surf_the_web> = <How_do_I> /0.9/ <get_paid> /0.2/ <to_surf_the_web>;

<How_does_eStore_make_money> = How does <eStore> /0.85/ <get_paid>;

<get_paid> = /0.7/ get <paid> | /0.3/ make <money>;

<money> = /0.2/ cash | /0.6/ money | /0.2/ dollars;

<paid> = /0.8/ paid | /0.1/ reimbursed | /0.1/ rewarded;

<to_surf_the_web> = to /0.9/ <surf> <article> /0.8/ <web>;

<article> = /0.33/ a | /0.33/ the | /0.33/ an;

<web> = /0.25/ internet | /0.25/ world wide web | /0.25/ web | /0.25/ www;

<surf> = /0.5/ surf | /0.1/ browse | /0.1/ view | /0.3/ look at;

<How_do_I> = /0.6/ how do I | /0.4/ can I;

<eStore> = eStore | you | your company | you guys;

METHOD AND APPARATUS FOR MULTIPLE
TIERED MATCHING OF NATURAL
LANGUAGE QUERIES TO POSITIONS IN A
TEXT CORPUS

FIELD OF THE INVENTION 

The present invention relates to the field of human- 
computer interaction using natural language. In particular, 
the present invention discloses methods of using a multiple 
tiered approach to matching a human's natural language 
query to elements of a corpus of information, in order to
present the user with relevant sections of information from
the corpus.

BACKGROUND OF THE INVENTION 

The Internet evolved during the late 20th century into a vast
infrastructure of communication that affords billions of
people nearly instant access to millions of World Wide
Web sites. Large numbers of commercial Web sites that sell
products and services have blossomed to take advantage of 
this new communication medium. These Internet-based 
commercial web sites are often referred to as "ecommerce" 
sites. Many large ecommerce sites on the Web receive tens 
of thousands of customer inquiries each day. 

The ecommerce companies receive customer inquiries 
through multiple different transport channels on the Internet 
including email, web forms, and chat or other real-time 
interactions with a human customer service agent. To keep 
their customers happy, these ecommerce sites need efficient 
systems for responding to these voluminous customer 
inquiries. To provide simple automated customer support 
many ecommerce sites provide a search engine to the users. 
A search engine accepts a set of designated keywords and 
uses those keywords to locate information related to the 
keywords. 

To provide a simple customer response system, many 
ecommerce Web sites now offer "free-text interactions"
across the Internet. Free-text interactions allow a user to 
enter a multiple word query in the form of a question or 
command that is handled automatically by a query resolu- 
tion system running on a computer system. Free-text inter- 
actions often involve referring the user to a certain section 
of the web site or a certain part of the web site containing a 
pre-defined set of "Frequently Asked Questions" and 
answers. The frequently asked questions and answers are 
commonly referred to as an FAQ. Based on the degree of the 
best match for the multiple word query, the user may be 
presented with a set of hyperlinks for destinations on the 
web site. These destinations may include one or more 
sections of the FAQ. Alternatively, the user might be pre- 
sented with a customized web page that contains the relevant 
FAQ entries in it. 

A typical search engine allows a user to enter a small 
number of keywords as search terms that will be searched 
for in a target database such as the ecommerce web site. 
Alternatively, a search engine may be used to handle free- 
text queries, where selected words from the free-text query 
are matched against an indexed form of a corpus under 
examination. To select words in the free-text query for 
matching, a search engine might select all words besides 
those considered as "stop words." Stop words are words that 
occur often in language and do not convey much informa- 
tion in and of themselves. In the English language, stop 
words include common prepositions, articles, and conjunctions.
For example, "on," "above," "in," "and," and "the" are
usually considered stop words.

To simplify human and computer interaction at an automated
web site, many web site designers have decided to
anthropomorphize the search engine of a web site with a
computer-based agent. In this manner, the users will interact
with the agent as if the agent had the intelligence or verbal
communication skills of a human being. In these human-computer
interactions with the agent, the customer may be
encouraged by the web site to ask a question in its natural
form. For example, the site may present the user with a
one-line text box with the prompt: "Please ask a question,
such as 'What does your company do?'"

Unfortunately, customers may ask questions that the natural
language query facility misinterprets and thus does not
provide meaningful results. For example, if a natural language
query facility strips out the stop words of "What does
your company do?" and applies the keywords to a search
engine, the search engine will not likely be able to provide
meaningful results. One reason this may occur is that the
question "What does your company do" does not contain a
set of words that conveys the overall meaning of the query
once the stop words have been removed. For example, the stop
word removal system of one embodiment leaves only the
word "company". It is unlikely that a search of the web site's
content for the word "company" will provide meaningful
results to the original question. In fact, it is likely that the
search engine will return a plethora of irrelevant results.

Natural language query systems on the World Wide Web
have proven to be quite popular with the general public.
However, the current implementations of natural language
query systems often yield inaccurate or limited results. It
would therefore be desirable to have improved natural
language query systems that provide improved results.

SUMMARY OF THE INVENTION 

A system for matching natural language queries to web
site content is disclosed. The query resolution system returns
zero or more links to content that is relevant to the user's
query. The present invention for query resolution combines
two or more types of natural language query resolution
methods, where the knowledge base for each of the methods
comes from a single knowledge specification.

The various different natural language query resolution
methods differ fundamentally in how they match the user
query to the web site content. The results of the resolution
methods are ranked and all, some, or none of the results of
each system may be displayed.

Other objects, features, and advantages of the present
invention will be apparent from the accompanying drawings
and from the following detailed description.

BRIEF DESCRIPTION OF THE DRAWINGS 

The objects, features, and advantages of the present
invention will be apparent to one skilled in the art, in view
of the following detailed description in which:

FIG. 1A illustrates a typical computer environment that
may use the query resolution teachings of the present
invention, in the context of web-based self-service.

FIG. 1B illustrates a block diagram of one embodiment of
a multi-tiered natural language query resolution system
created using the teachings of the present invention.

FIG. 2A illustrates a set of tags and associated links.

FIG. 2B illustrates a regular grammar that is a proper
subset of the class of context free grammars.

FIG. 3 illustrates a Bayesian network created from the 
grammar of FIG. 2B. 

FIG. 4A illustrates a first generated local network
structure for the Bayesian network of FIG. 3.

FIG. 4B illustrates a second generated local network
structure for the Bayesian network of FIG. 3.

FIG. 4C illustrates a third generated local network
structure for the Bayesian network of FIG. 3.

FIG. 5A illustrates a third generated local network
structure for the Bayesian network of FIG. 3.

FIG. 5B illustrates a third generated local network
structure for the Bayesian network of FIG. 3.

FIG. 6 illustrates a flow diagram describing the operation
of one embodiment of the multi-tiered natural language query
resolution system of the present invention.

DETAILED DESCRIPTION OF THE
PREFERRED EMBODIMENT

A method and apparatus for implementing a multiple-tiered
system for resolving natural language queries is
disclosed. In the following description, for purposes of
explanation, specific nomenclature is set forth to provide a
thorough understanding of the present invention. However,
it will be apparent to one skilled in the art that these specific
details are not required in order to practice the present
invention. Furthermore, the present invention has been
disclosed with reference to specific embodiments. For example,
the present invention will be disclosed with reference to an
embodiment wherein a user interacts with a web site using
natural language. However, many variant embodiments can
be created using the same techniques. For example, an
alternative embodiment may interact with a user's natural
language query received via email or over the telephone.

Overview of a Multi-tiered Natural Language
Query Processing System

To provide an efficient and accurate natural language
processing system, the present invention introduces a
dynamically adjustable multi-tiered natural language
processing system. The multi-tiered natural language processing
system of the present invention selects one or more
natural language processing systems from a set of multiple
different natural language processing systems in order to
evaluate a particular natural language query. The selection is
performed using metalevel reasoning systems that estimate
the expected results of each different natural language
processing system.

FIG. 1A illustrates a block diagram of a web-based natural
language query resolution system that may incorporate the
teachings of the present invention. Referring to FIG. 1A, a
user interacts with the web site on a web server 120 using a
web browser on a personal computer 101. The personal
computer 101 may communicate with the web server 120
across the global Internet 110. However, it should be noted
that any type of communication link might be used including
an intranet, a direct connection, or a proprietary data
connection. Typically, the personal computer 101 communicates
with the web server 120 using the HyperText Transfer
Protocol (HTTP), although any other protocol may be
used.

Web server 120 may present the user with a web page that
allows a user to enter a natural language query in order to
search for desired information. When the user enters a
natural language query into the web page displayed on his
personal computer 101, the personal computer 101 may use
an HTTP "POST" command to return the user's natural
language query to the web server 120.

Most web servers are not equipped to perform complex
processing of user information. Thus, the web server 120
may communicate the natural language query to an
application server 130 to have a special application on the
application server process the natural language query. The
web server can use any type of standard or proprietary
protocol to send the natural language query to the
application server 130.

The teachings of the present invention may be implemented
within the multi-tiered natural language query processing
system 100 illustrated in FIG. 1B that runs on the
application server 130 of FIG. 1A. Referring to FIG. 1B, the
multi-tiered natural language query processing system 100
processes natural language queries to find a set of
desired results. In one embodiment, the multi-tiered natural
language query processing system 100 processes the natural
language query in order to locate relevant sections of
information from a corpus 135. The corpus 135 may comprise
a text file. In such an embodiment, the multi-tiered
natural language query processing system 100 may return a
set of pointers to sections of text in the text corpus 135 under
examination. In one embodiment, these pointers may comprise
a set of hyperlinks and bookmarks to sections of text
from the text corpus 135. A relevant excerpt of the text may
accompany the returned hyperlinks such that the excerpt of
text may be placed immediately before or after the hyperlink
reference in order to provide the user with a preview of the
information.

In one embodiment, the corpus 135 consists of general
content on the web site and specifically designated "frequently
asked questions" (FAQ) web pages. Furthermore, one
embodiment of the present invention maintains a
many-to-many mapping 140 of tags to pointers in the corpus
135. Each tag may correspond to a plurality of pointers, and
any pointer may be mapped to a plurality of tags. For
example, FIG. 2A illustrates ActionTag1 that corresponds to
Hyperlink1, Hyperlink2, and Hyperlink4. Furthermore, FIG.
2A illustrates ActionTag2 that corresponds to Article2,
Article3, and Article4. This many-to-many mapping allows
the invention to represent the relationship that one question
or concept might be related to multiple hyperlink
destinations, and one hyperlink destination might be related
to more than one question.

In order to determine which tags to return, the multi-tiered
natural language query processing system 100 of the present
invention uses a plurality of different natural language query
resolution methods 150 such as grammar based parse tree
151, Bayesian Network 153, a search engine using annotated
terms 155, and an all-term search engine 157. Furthermore,
the multi-tiered natural language query processing system
100 utilizes a metareasoner 160 in order to determine which
natural language query resolution methods 150 should be
called and the order in which the natural language query
resolution methods 150 should be called.

It is often advantageous to call only a subset of the
available natural language query resolution methods 150 due
to environmental conditions that may exist. For example,
users may have an expectation for a quick response such that
if the current load of the application server is high then only
a subset of natural language query resolution methods 150
should be used to provide a quick response. Thus, the
metareasoner 160 may track the server load to determine the
amount of computing resources available to invoke the natural
language query resolution methods. Many other factors may
be considered by the metareasoner 160 to determine which
natural language query resolution methods to invoke.

The metareasoner 160 may also dynamically determine
the invocation order of the natural language query resolution
methods 150. Such a determination may be made once
initially or after each natural language query resolution
method invocation. The metareasoner 160 uses the
metareasoning knowledge base 170 to determine the
order of natural language query resolution methods to
invoke.

One type of knowledge in the metareasoning knowledge
base 170 is the metareasoning heuristics module 172. Any
number of heuristics or other quickly computed strategies
could be invoked. For example, one heuristic that the
metareasoning heuristics module 172 might employ is that
only a short query should be attempted by the grammar.
Another metareasoning heuristic that may be employed is
that queries with multiple independent clauses should not be
processed by the grammar-based method if multiple
independent clauses are not modeled in the grammar. Such
queries with multiple independent clauses should be
processed first by the Bayesian network. For example, if the
grammar based method is known to only handle single
questions at a time such as "When was your company
funded?" then a heuristic would be to look for queries with
multiple question marks or periods. This type of logic can be
codified in any number of ways, including procedural
programming languages such as Java.

In a preferred embodiment, the metareasoning knowledge
base 170 further comprises an Accuracy Results module
175. The Accuracy Results module 175 computes and stores
recall and precision for each natural language query
resolution method. For the task of retrieving relevant FAQ
entries, recall is defined as the number of relevant FAQ
entries returned by a natural language query resolution
method divided by the number of relevant FAQ entries that
should have been returned. Precision is the number of
relevant FAQ entries returned, divided by the total number
of FAQ entries returned. Recall and precision are standard
metrics used in the field of information retrieval.

The Accuracy Results module 175 computes recall and
precision using data from previous natural language
inquiries. For a certain subset of previous queries that have been
answered by the invention, either an administrative agent or
the author of the query himself can record whether the
answers provided were correct. With this information, the
Accuracy Results module 175 can acquire and store the
necessary information in order to assign recall and precision
values to each natural language query method. Details for
computing recall and precision will be obvious to one skilled
in the art.

If actual statistics on recall and precision are not available,
possibly because the invention has been recently installed,
approximations of the recall and precision can be used until
the system has answered enough queries to generate a proper
sample. One skilled in the art will have a sense for the
relative differences in recall and precision between different
query resolution modes. For example, recall increases while
precision decreases when the query resolution method shifts
from the grammar 151, to the Bayesian network 153, to the
search engine with annotated terms 155, to the search engine
with all terms 157. Whether the recall and precision values
are computed from actual statistics or estimated, the recall
and precision values can be used as "expected" recall and
precision values in the Accuracy Results module 175. In any
situation, the best-known approximations of the recall and
precision values should be used in the present invention.

By computing or retrieving an expected recall and an
expected precision as previously set forth, the metareasoner
160 may select the natural language query resolution methods
150 to invoke and the order in which they should be
invoked. The metareasoner 160 may also consider a large
number of other factors including expected time of
computation, resources available, and desired accuracy.
These factors may be combined with the expected recall and
expected precision. In a preferred embodiment, the present
invention orders the invocation of the natural language
query resolution methods 150 by their expected precision.
Typically, there is a tradeoff between precision and recall in
information retrieval. In one embodiment, the present invention
first attempts the natural language query resolution
method with the highest expected precision. If this first
natural language query resolution method invocation does
not return a sufficient number of documents, the
metareasoner 160 may then invoke the next query resolution
method.

The ordering of query resolution methods may change
based on the results of the methods already invoked. For
example, if a search engine method using annotated words
returned a large number of results with low precision then
the metareasoner may select to next invoke the Bayesian
network method 153 instead of the all-term search engine
based method 157 that will likely produce even less precise
results. Furthermore, any number of other measurements can
be used to determine the natural language query resolution
method invocation ordering. For example, if the first
invoked natural language query resolution method returns
no results, and computation time is limited, then the
metareasoner 160 can then immediately invoke a subsequent
natural language query resolution method that favors recall
over precision with minimal computation time in order to
generate quickly a large number of results.

Individual Methods for Processing Natural
Language Queries

As previously set forth, many different types of natural
language query resolution methods may be used by the
present invention. A summary of a few of the different
natural language query processing systems that may be used
is presented below. However, this list is not exhaustive as
other natural language query resolution methods may also be
used in the system of the present invention.

Grammar Based Matching Systems

Grammar-based matching schemes are commonly used
for speech recognition and interactive voice response
applications in order to recognize exactly which words the user
said. Typically, the grammar is expressed as a context-free
grammar, since there exist efficient parsers for this class of
grammars, and there do not exist efficient parsers for
grammars that are not context-free. Grammar based parsers
typically exhibit extremely high precision at the expense of
recall. In other words, when grammar-based parsers return a
match, that match will usually be correct. However,
grammar-based parsers are susceptible to failing to find a
match for queries that seem similar to patterns that have
been entered in the grammar since such parsers can be
intolerant of unanticipated query constructions. For
example, a query such as "How do I return my product"
might match the grammar, while a query with the same
meaning "How do I go about doing a return on a product that
I recently received" might not match the grammar.

Feature Based Matching Systems

An alternative to the grammar-based matching method is
to use any number of feature-based pattern matching
methods, such as neural networks or Bayesian networks.
These types of methods are often referred to as uncertainty
based reasoning systems. To use these feature-based
methods, the natural language query must be transformed 
into a set of features. Often, these features are based on the 
individual words themselves or at a higher lexical level. 
Lexical analysis of the query may involve elimination of 
stop words or other commonly occurring words. Lexical 
analysis of the query may also involve extracting phrases 
from the query. Phrases can be identified ahead of time by 
analyzing an archive of queries that have been processed by 
the present invention. Any number of methods can be used 
to identify phrases automatically, as one skilled in the art 
will recognize. The features generated after lexical analysis 
such as stop word elimination and phrase identification can be
used as inputs to a fuzzy pattern-matching algorithm. This 
method is commonly referred to as the "bag of words" 
approach in text classification. 
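
The feature extraction described above can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the stop-word list, the phrase list, and the function name `extract_features` are assumptions introduced here, and a deployed system would derive its phrases from an archive of past queries as the text suggests.

```python
import re

# Illustrative lists only; neither comes from the source document.
STOP_WORDS = {"a", "an", "the", "my", "is", "what", "do", "i", "to", "of", "in"}
KNOWN_PHRASES = ["account balance", "world wide web"]


def extract_features(query):
    """Return a bag-of-words feature set: known phrases plus non-stop words."""
    # Normalize to lowercase words separated by single spaces.
    text = " ".join(re.findall(r"[a-z0-9']+", query.lower()))
    features = set()
    # Pull out known multi-word phrases first so each survives as one feature.
    for phrase in KNOWN_PHRASES:
        pattern = r"\b" + re.escape(phrase) + r"\b"
        if re.search(pattern, text):
            features.add(phrase)
            text = re.sub(pattern, " ", text)
    # Remaining single words become features unless they are stop words.
    features.update(word for word in text.split() if word not in STOP_WORDS)
    return features


print(extract_features("What is my account balance?"))
```

The resulting set could then be fed to any feature-based matcher, such as a Bayesian network whose evidence nodes correspond to the features.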

Search Systems Using Domain-specific Keywords 

A search system based on domain-specific keywords 
combines a search engine with a pre-defined, domain- 
specific lexicon of terms that will be used. The domain- 
specific lexicon can be created automatically or manually. 
The lexicon will have terms that are typically used in the
discourse of the domain, particularly those that have
informational value. For example, in the application domain of
on-line brokerage, the terms "stock," "bond," "IRA," and
"account balance" would be likely terms. In the same 
application domain, most stop words and words such as 
"company" or "blue" would likely not be in the lexicon. The 
use of a domain-specific lexicon with a search engine, 
instead of allowing all words to be used as search terms in
a search engine, will tend to increase precision of returned
results, while lowering recall. Any number of commercially 
available search engines may be used for this task. 
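
As a rough sketch of the lexicon filtering step, the snippet below keeps only query terms that appear in a domain-specific lexicon before they would be passed to a search engine. The lexicon contents mirror the on-line brokerage example above; the function name `lexicon_terms` and the two-word lookahead are assumptions made for illustration.

```python
import re

# Hypothetical lexicon for the on-line brokerage example.
LEXICON = {"stock", "bond", "ira", "account balance"}


def lexicon_terms(query, lexicon=LEXICON):
    """Return the lexicon entries found in the query, in query order."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    terms = []
    i = 0
    while i < len(words):
        # Prefer a two-word lexicon entry over a single word.
        if i + 1 < len(words) and " ".join(words[i:i + 2]) in lexicon:
            terms.append(" ".join(words[i:i + 2]))
            i += 2
        elif words[i] in lexicon:
            terms.append(words[i])
            i += 1
        else:
            i += 1  # Not in the lexicon, so it is dropped from the search.
    return terms


print(lexicon_terms("What is the account balance of my IRA at your company?"))
```

Words such as "company" never reach the search engine, which is why this method tends to trade recall for precision.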

Search Systems Using All Words 

A search system using all words attempts to locate
sections of text from the corpus that include all the keywords
in a natural language query after the stop words have been
removed. Standardized lists of stop words are available on
the Internet. These lists include words that typically carry
little information on their own merit. Examples of the words
include "and," "but," "my," "for," "because," and "the." A
search system may instead include in the search terms all 
words except for stop words and introductory question 
fragments, such as "What are ... " and "How do I ... " Some 
search engines will return a higher score for a document that 
contains a phrase that matches a phrase of two or more 
words in the query list. For example, "How do I get paid to 
surf the web" will invoke high scores from documents that 
contain the phrase "paid surf web", which results after the 
introductory question fragment and stop words are removed 
from the query. An all term keyword search system may use 
a conjunction of search terms or a disjunction. Using a 
conjunction of search terms (a logical "AND" of the terms) 
will lead to lower recall and higher precision, while a 
disjunction of search terms (a logical "OR" of the terms) will 
lead to higher recall but much lower precision. As with the 
search system using a domain-specific lexicon, any number
of commercially available search engines may be used for 
this task. 
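
The effect of conjunction versus disjunction on precision and recall can be seen in a small sketch. Everything here is hypothetical: the four-entry corpus, the hand-labeled relevant set, the stop-word list (which treats "get" as a stop word so that the example query reduces to "paid surf web" as in the text), and the function names.

```python
import re

# Illustrative lists; a real system would use a standardized stop-word list.
STOP_WORDS = {"a", "an", "the", "to", "and", "but", "my", "for", "because", "get"}
QUESTION_FRAGMENTS = ("how do i", "what are")

# Tiny made-up corpus and relevance labels for the example query.
CORPUS = {
    "faq1": "members are paid to surf the web with our toolbar",
    "faq2": "you get paid each month by check",
    "faq3": "our web site lists current job openings",
    "faq4": "earn rewards when you surf",
}
RELEVANT = {"faq1", "faq2", "faq4"}


def search_terms(query):
    """Drop an introductory question fragment and stop words from the query."""
    text = " ".join(re.findall(r"[a-z0-9]+", query.lower()))
    for fragment in QUESTION_FRAGMENTS:
        if text.startswith(fragment + " "):
            text = text[len(fragment):]
            break
    return [word for word in text.split() if word not in STOP_WORDS]


def search(terms, corpus, mode="and"):
    """Return ids of documents containing all terms ("and") or any term ("or")."""
    combine = all if mode == "and" else any
    return {doc_id for doc_id, text in corpus.items()
            if combine(term in text.split() for term in terms)}


def precision_recall(returned, relevant):
    """Precision = hits / returned; recall = hits / relevant."""
    hits = len(returned & relevant)
    precision = hits / len(returned) if returned else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


terms = search_terms("How do I get paid to surf the web")
for mode in ("and", "or"):
    found = search(terms, CORPUS, mode)
    print(mode, sorted(found), precision_recall(found, RELEVANT))
```

On this toy data the conjunction returns only `faq1` (precision 1.0, recall 1/3), while the disjunction returns all four entries (precision 0.75, recall 1.0), matching the tradeoff described above.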

Individual Natural Language Query Processing 
Knowledge Bases 

As set forth in the previous section, the present invention
may use several different types of natural language query
processing methods to process natural language queries.
Many different natural language query-processing methods
use a knowledge base during the natural language processing.
For implementation efficiency, one embodiment of the
present invention employs a single main knowledge
representation to generate a plurality of individual knowledge
bases for the plurality of natural language query resolution
methods. FIG. 1B illustrates the single main knowledge
specification 180. In a preferred embodiment, the present
invention uses an extension of a regular grammar as the
single main knowledge representation 180. Regular grammars
are a proper subset of context free grammars.

The grammar in the single main knowledge representation
180 can be physically formatted in any number of ways. In
one embodiment, the grammar is formatted using the Java™
Speech Grammar Format from Sun Microsystems of Mountain
View, Calif. The Java™ Speech Grammar Format
(JSGF) is a platform-independent, vendor-independent textual
representation of grammars for use in speech recognition.
In an alternative embodiment, the single main knowledge
representation 180 can be formatted using the
Extensible Markup Language commonly referred to as
XML.

In a preferred embodiment, the single main knowledge 
representation 180 is compiled into a parse tree 152 to 
generate one of the plurality of knowledge bases from the 
knowledge representation. Specifically, a parse tree 152 is 
used for a grammar based natural language query processing 
method 151. The grammar may be supplemented with 
probabilities in order to generate a Bayesian network from 
the grammar. However, probabilities are not required if the 
grammar is supplemented with annotations of important 
features. 

Consider the grammar example in FIG. 2B that is formatted
according to the Java Speech Grammar Format. The
nonterminal expressions <How_do_I_get_paid_to_surf_the_web>
and <How_does_eStore_make_money>
are the nonterminals of interest: if a parse of an incoming
query matches one of these, then we wish to take the action
as specified in the respective action tags. For example, if a
user enters "How do I get rewarded for surfing the Internet",
the grammar parser will match the nonterminal
<How_do_I_get_paid_to_surf_the_web> and return the tag
ActionTag1.
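
A strict grammar matcher for rules of this kind can be sketched as below. This is an illustration under stated assumptions, not the disclosed parser: the rules are a hand transcription of FIG. 2B with the weights omitted and terminals lowercased, and the names `GRAMMAR`, `ACTION_TAGS`, `match`, and `resolve` are introduced here for the example.

```python
import re

# Hand-transcribed rules of FIG. 2B (weights dropped, terminals lowercased).
# Tokens wrapped in <> are nonterminals; everything else is a literal word.
GRAMMAR = {
    "How_do_I_get_paid_to_surf_the_web": [["<How_do_I>", "<get_paid>", "<to_surf_the_web>"]],
    "How_does_eStore_make_money": [["how", "does", "<eStore>", "<get_paid>"]],
    "get_paid": [["get", "<paid>"], ["make", "<money>"]],
    "money": [["cash"], ["money"], ["dollars"]],
    "paid": [["paid"], ["reimbursed"], ["rewarded"]],
    "to_surf_the_web": [["to", "<surf>", "<article>", "<web>"]],
    "article": [["a"], ["the"], ["an"]],
    "web": [["internet"], ["world", "wide", "web"], ["web"], ["www"]],
    "surf": [["surf"], ["browse"], ["view"], ["look", "at"]],
    "How_do_I": [["how", "do", "i"], ["can", "i"]],
    "eStore": [["estore"], ["you"], ["your", "company"], ["you", "guys"]],
}
ACTION_TAGS = {
    "How_do_I_get_paid_to_surf_the_web": "ActionTag1",
    "How_does_eStore_make_money": "ActionTag2",
}


def match(symbol, words, start):
    """Return every index at which `symbol` can end when matched from `start`."""
    ends = set()
    for alternative in GRAMMAR[symbol]:
        positions = {start}
        for token in alternative:
            next_positions = set()
            for pos in positions:
                if token.startswith("<"):
                    # Nonterminal: recurse into its own alternatives.
                    next_positions |= match(token[1:-1], words, pos)
                elif pos < len(words) and words[pos] == token:
                    # Terminal: consume exactly one matching word.
                    next_positions.add(pos + 1)
            positions = next_positions
        ends |= positions
    return ends


def resolve(query):
    """Return the action tag of the first rule that covers the whole query."""
    words = re.findall(r"[a-z]+", query.lower())
    for rule, tag in ACTION_TAGS.items():
        if len(words) in match(rule, words, 0):
            return tag
    return None


print(resolve("How do I get paid to surf the web?"))   # ActionTag1
print(resolve("How does your company make money"))     # ActionTag2
print(resolve("How do I go about doing a return"))     # None
```

Because this sketch accepts only word sequences that the rules spell out, it shows the high-precision, low-recall behavior attributed to grammar-based methods: a paraphrase outside the rules simply returns no tag.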

The present invention generates a plurality of knowledge 
bases (KBs) for various natural language query processing 
methods from a single main knowledge specification 180 to 
create a plurality of natural language query resolution meth- 
ods. In a preferred embodiment, the present invention gen- 
erates both a parse tree knowledge base (KB) 152 and 
Bayesian network knowledge base (KB) 154 from the single 
main knowledge specification 180. 

Methods of generating a parse tree knowledge base from 
a regular grammar are well known. One example of a method
for generating a parse tree knowledge base from regular 
grammar can be found in introductory texts such as Chapter 
4 of Crafting a Compiler, by Charles N. Fischer and Richard 
LeBlanc, 1991. 

In a preferred embodiment, the invention also generates a
Bayesian network knowledge base 154 for the Bayesian
network 153 from the single main knowledge specification 180.
A Bayesian network knowledge base 154 is a directed acyclic
graph, where nodes represent random variables, and directed
arcs represent conditional probability distributions over
the states of the child node given the state of the parent node.
The probability distributions may be either continuous or 
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discrete. The absence of an arcs between any two nodes <X>-present)-l-OR(P4, P9). Here, the OR function is the 
denotes an assumption of conditional independence. In a standard noisy-OR combination function, where 
preferred embodiment of the present invention, the distri- 
butions are discrete. Nonetheless, one skilled in the art will 

note that the teachings of the invention can be similarly 5 '"' Pn ^~ ~j 1 

applied to continuous probability distributions. 



Bayesian Network Generation Using Rule 
Expansion 

From the example main knowledge base grammar in FIG. 
2B, the present invention may generate the knowledge 
base for the Bayesian network illustrated in FIG. 3. The 
algorithm to generate the Bayesian network from the 
grammar is straightforward. 

A rule expansion of the form:

<Y> = /p1/ <A> {leak=p5} | /p2/ <B> {leak=p6};

generates the local network structure in FIG. 4A. Note that
the leak probabilities need to be specified in order to
represent a nonzero value for the probability of the events
that <A> or <B> occur without any modeled influence
occurring. In other words, for example, <A> might occur in
a natural language query with a probability of p5 even if
none of the parents of <A> is present.

A rule expansion of the form:

<X> = /p3/ {leak=p7} <Y> <D> /p4/ {leak=p8} <Z>;

generates the local network structure in FIG. 4B. Combining
these two examples, a set of rule expansions of the form:

<X> = /p3/ {leak=p7} <Y> <D> /p4/ {leak=p8} <Z>;
<Y> = /p1/ <A> {leak=p5} | /p2/ <B> {leak=p6};

generates the local network structure in FIG. 4C. The local
network structure in FIG. 4C is simply the graph union of the
local network structures in FIGS. 4A and 4B. The conditional
probability distributions in FIG. 4C are exactly the
same as those in FIGS. 4A and 4B.

For the rule expansion below, the present invention
generates the local Bayesian network structure in FIG. 5A.

<V> = /p9/ {leak=p10} <Z> <C>;

For the rule expansions below, the present invention
generates the local Bayesian network structure of FIG. 5B.

<X> = /p3/ {leak=p7} <Y> <D> /p4/ {leak=p8} <Z>;

<Y> = /p1/ <A> {leak=p5} | /p2/ <B> {leak=p6};

<V> = /p9/ {leak=p10} <Z> <C>;

Again, the graph union of the local structures in FIGS. 4A,
4B, and 5A generates the Bayesian network structure of FIG. 5B.
Where there are multiple parents of a single node arising
from different rule expansions, the present invention uses a
combination function to merge probability distributions. For
example, the parents <V> and <X> appear on the left hand
side of rule expansions, both of which yield a local structure
with <Z> as a child node. In FIG. 5B, the conditional
probability table depicts
P(<Z>=present | <V>=present, <X>=present) = 1-OR(p4, p9).
Here, the OR function is the standard noisy-OR combination
function, where

$$\mathrm{OR}(p_1, \ldots, p_n) = \prod_{j=1}^{n} (1 - p_j)$$


Any number of combination functions can be used in the
present invention. The noisy-OR can be used, or a simple
MAX function can be used, as in the combination of the leak
probabilities P8 and P10.

More generally, to generate the Bayesian network corre- 
sponding to n rule expansions in a grammar, the present 
invention assembles the graph union of the local network 
structures corresponding to each of the n rule expansions. 
Where necessary, the present invention merges probability 
distributions using standard combination functions. 
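
The graph union and the two combination functions described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the arc tuples, the numeric weights standing in for p1, p2, p4, p9 and the leaks, and the function names are all hypothetical and do not come from the figures. The sketch also assumes that the leak value is used only when no modeled parent is present.

```python
from collections import defaultdict


def noisy_or(weights):
    """Probability that a child is present when every listed parent is present."""
    absent = 1.0
    for w in weights:
        absent *= 1.0 - w          # each parent independently fails to cause the child
    return 1.0 - absent


def union_local_structures(arcs):
    """Graph union of local structures given as (parent, child, weight, leak) arcs."""
    parents = defaultdict(dict)    # child -> {parent: weight}
    leaks = {}                     # child -> merged leak probability
    for parent, child, weight, leak in arcs:
        parents[child][parent] = weight
        # Leaks from different rule expansions are merged with MAX.
        leaks[child] = max(leaks.get(child, 0.0), leak)
    return dict(parents), leaks


def p_child_present(child, present_parents, parents, leaks):
    """Noisy-OR over the present parents; fall back to the leak if none is present."""
    active = [w for p, w in parents[child].items() if p in present_parents]
    return noisy_or(active) if active else leaks[child]


# Hypothetical numbers standing in for the symbolic p values of the rule expansions.
ARCS = [
    ("Y", "A", 0.7, 0.001),   # <Y> -> <A>, weight p1, leak p5
    ("Y", "B", 0.3, 0.001),   # <Y> -> <B>, weight p2, leak p6
    ("X", "Z", 0.8, 0.001),   # <X> -> <Z>, weight p4, leak p8
    ("V", "Z", 0.5, 0.010),   # <V> -> <Z>, weight p9, leak p10
]
PARENTS, LEAKS = union_local_structures(ARCS)
```

With these assumed numbers, the probability of <Z> given that both <V> and <X> are present is 1 - (1 - 0.8)(1 - 0.5) = 0.9, and the merged leak for <Z> is the larger of the two leak values.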

The present invention uses a default leak probability,
which can be set to any arbitrary value. This leak probability
is used whenever a leak is necessary in the conditional
probability tables of the Bayesian network being generated, but
a leak probability does not explicitly appear in the
JSGF file. In the example in FIG. 2B, a default leak of
p=0.001 for the non-absent states of a node is used. To
generate the network in FIG. 3 from the grammar in FIG.
2B, the present invention uses a MAX function for combining
leak values and a noisy-OR for combining probabilities
of non-absent states of nodes. These combination functions
were used by the invention to compute the conditional
probability distribution of <get_paid> given the states of its
parents.

Bayesian Network Generation From Annotated
Features

A simple method of generating a Bayesian network from
a simple grammar is to assign probabilities to the features
that are used to build a terminal expression. For example, the
non-terminal expression <How_do_I_get_paid_to_surf_the_web>
is made up of the following features:

<How_do_I>

<get_paid>

<to_surf_the_web>

To build a Bayesian network, those desirable features must
be annotated. Once a set of annotated features has been
established, a set of probabilities may be automatically
assigned.


Using The Generated Bayesian Network 

To use a Bayesian network that the present invention
generates, the invention instantiates or sets the values of
nodes in the network corresponding to words that are found
in the natural language query. For example, "How do I get
rewarded to surf the Internet?" would set the value of the
following nodes:

<paid> is set to "rewarded"

<surf> is set to "surf"

<web> is set to "internet"

After these nodes are set to their values, the invention
propagates the effects of this evidence through the Bayesian
network using standard Bayesian network inference methods.
The invention thereby ascertains marginal probabilities
on the nodes of interest.
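
The evidence-setting step can be illustrated with a short Python sketch. It covers only the instantiation of nodes from query words, not the propagation of evidence, which would be left to a standard inference routine. The terminal alternatives are copied from the grammar of FIG. 2B; the function name, the text normalization, and the longest-phrase-first matching are assumptions made for this illustration.

```python
import re

# Terminal alternatives taken from the FIG. 2B grammar.
TERMINALS = {
    "paid": ["paid", "reimbursed", "rewarded"],
    "money": ["cash", "money", "dollars"],
    "surf": ["surf", "browse", "view", "look at"],
    "web": ["internet", "world wide web", "web", "www"],
}


def instantiate_evidence(query):
    """Map each terminal node to the phrase of the query that sets its value."""
    # Lowercase and drop punctuation so "Internet?" matches "internet".
    text = " ".join(re.findall(r"[a-z0-9]+", query.lower()))
    padded = f" {text} "
    evidence = {}
    for node, phrases in TERMINALS.items():
        # Try longer phrases first so "world wide web" wins over "web".
        for phrase in sorted(phrases, key=len, reverse=True):
            if f" {phrase} " in padded:
                evidence[node] = phrase
                break
    return evidence


EVIDENCE = instantiate_evidence("How do I get rewarded to surf the Internet?")
```

For the example query, `EVIDENCE` holds the three assignments listed above for <paid>, <surf>, and <web>, and the <money> node is left unset.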
In the example in FIG. 3, the probabilities of interest are:

P(<How_do_I_get_paid_to_surf_the_web>=present | query)

P(<How_does_eStore_get_paid>=present | query)

In a natural language query processing method consisting
of the search engine using domain-specific keywords, the
present invention uses a search engine in order to match
selected terms from the natural language query to an indexed
form of the corpus 135. To increase the precision of the
search, the invention limits the search terms to be the
intersection of terms in the natural language query with
terms corresponding to child nodes in the Bayesian network
knowledge base. For example, the terms corresponding to
child nodes in the Bayesian network in FIG. 3 are: paid,
reimbursed, rewarded, cash, money, dollars, browse, view,
surf, look at, world wide web, web, internet, and www.

Metareasoning Processing

The present invention can employ either a static or
dynamic ordering for invoking query resolution methods. In
a static ordering, the invocations of the various natural
language query resolution methods are ordered by decreasing
levels of expected precision. For the query resolution
methods of a preferred embodiment, this ordering would be:
the parse tree based grammar method 151, the Bayesian
network method 153, the annotated term search engine
method 155, and the all term search engine method 157.

In a dynamic ordering, the metareasoner 160 revises the
ordering based on the results of each successive invocation
and available computational resources. For example, if the
Bayesian network method 153 returns an empty set of tags
and computational time is limited then the metareasoner 160
can skip the annotated terms search engine query resolution
method 155 and invoke the all term keyword search engine
method 157. Calling the all term keyword search engine
method 157 instead of the annotated terms search engine
query resolution method 155 sacrifices precision in favor of
higher recall.

One Metareasoner Embodiment

FIG. 6 illustrates a flow diagram that describes the
operation of one embodiment of a dynamic ordering multi-tiered
natural language query processing system. Referring to FIG.
6, the multi-tiered natural language query processing system
first receives a natural language query at step 610. The
multi-tiered natural language query processing system then
processes the natural language query using the metareasoning
knowledge base 170 to generate expected recall and
precision values at step 620.

At step 630, the metareasoner 160 of the multi-tiered
natural language query processing system selects a natural
language query processing method using the results from the
knowledge base 170. The metareasoner 160 may optionally
take into consideration additional factors 635 such as the
current computational load. The multi-tiered natural
language query processing system invokes the selected
language query processing method to process the natural
language query at step 640. The results of the invocation 645
may be kept for later use.

At step 650, the metareasoner 160 of the multi-tiered
natural language query processing system determines if the
current cumulative results are acceptable, if there are
insufficient resources for further evaluation, or if all the
different processing methods have been invoked. If the current
cumulative results are acceptable, if there are not sufficient
resources for additional processing, or if all the different
processing methods have been invoked, then the multi-tiered
natural language query processing system returns the current
results at step 690. The resources examined may be time or
computational resources.

Referring back to step 650, if the cumulative results are
insufficient and there are enough resources for additional
processing, then the multi-tiered natural language query
processing system returns to step 630 to select another
natural language query processing method to invoke. At
step 630, the metareasoner 160 selects another natural
language query processing method using a number of
different data sources that may include the results from the
metareasoning knowledge base 170, the additional factors
635, and the results 645 from previous invocations. The
system proceeds through steps 630, 640, and 650 until
desired results are achieved, insufficient resources are
available, or all the different natural language query
resolution methods have been invoked. After completion, the
results are returned at step 690.

Example Of Metareasoner Workflow

An example of how the present invention handles various
sample natural language queries will aid in demonstrating
the teachings of the invention. Consider the following
natural language queries:

Query 1. "How do I get paid to surf the web?"

Query 2. "Get money to surf the web?"

Query 3. "I need to make some money. Your site looks
very interesting to me. How do I get paid if I want to
surf the Internet?"

Assume an embodiment of the present invention in which
the metareasoner employs a dynamic ordering of query
resolution methods as illustrated in FIG. 6. Furthermore,
assume an embodiment in which there are four query
resolution methods, as illustrated in FIG. 1: a parse tree
based grammar method 151, a Bayesian network method
153, an annotated term search engine method 155, and an all
term search engine method 157.

At the time the query arrives at the application server 130,
the metareasoner 160 consults the metareasoning knowledge
base 170. Using information from the metareasoning
knowledge base 170, the metareasoner 160 determines that there
are resources available to evaluate the query using the
grammar based natural language query processing method
151, and the query is evaluated against the parse tree
knowledge base. If the invention contains a grammar as
depicted in FIG. 2B, then Query 1 will match and
ActionTag1 will be returned.

Under the same initial conditions, Query 2 will not match
the parse tree generated by the grammar in FIG. 2B.
Assuming that computational resources are not constrained, the
metareasoner 160 invokes the remaining query resolution
method with the highest expected precision, the Bayesian
network 153. The Bayesian network depicted in FIG. 3 will
return ActionTag1 when presented with the query features
"paid", "surf", and "web".

Consider Query 3 in a situation in which computational
resources are constrained. For example, multiple queries can
be queued in the application server 130, awaiting evaluation.
The metareasoner 160 uses a heuristic from the heuristics
module 172 that indicates that if the query is long and
contains a period then the grammar based natural language
query resolution method 151 should be skipped. This
heuristic, combined with the existing resource constraints,
dictates that the metareasoner invoke the annotated term
search engine 155. Any number of methods can be used to
codify this logic, such as a rule engine. Methods for
symbolic reasoning using a rule engine are well known, and can
be found in a variety of introductory books on artificial
intelligence, such as "Rule-based Expert Systems" by Bruce
Buchanan and Edward Shortliffe. The annotated terms from
Query 3 are "money", "paid", "surf", and "internet". These
terms are used as input to the search engine 155. The search
engine returns a number of tags, including ActionTag1.
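
The dynamic ordering loop of FIG. 6 and the three example queries can be sketched as follows. This is a minimal illustration, not the implementation of the metareasoner 160: the `Method` record, the precision and cost numbers, the stub resolution functions, the length threshold in the heuristic, and the rule that any non-empty result set is acceptable are all hypothetical assumptions.

```python
import re
from dataclasses import dataclass
from typing import Callable, Set


@dataclass
class Method:
    name: str
    expected_precision: float        # assumed output of step 620
    cost: int                        # hypothetical resource units
    run: Callable[[str], Set[str]]   # returns a set of action tags


def words(query):
    return set(re.findall(r"[a-z0-9]+", query.lower()))


# Stub resolution methods; real ones would use the generated knowledge bases.
def grammar(q):
    return {"ActionTag1"} if q.lower().strip("?") == "how do i get paid to surf the web" else set()


def bayes(q):
    return {"ActionTag1"} if {"surf", "web"} <= words(q) else set()


def annotated(q):
    w = words(q)
    return {"ActionTag1"} if (w & {"money", "paid"}) and "surf" in w else set()


def all_terms(q):
    return {"ActionTag1", "ActionTag2"}


METHODS = [
    Method("grammar", 0.95, 1, grammar),
    Method("bayes", 0.80, 5, bayes),
    Method("annotated", 0.60, 2, annotated),
    Method("all_terms", 0.30, 1, all_terms),
]


def skip_grammar_for_long_queries(query, method):
    """Heuristic from the text: skip the grammar if the query is long and has a period."""
    return method.name == "grammar" and len(query.split()) > 12 and "." in query


def resolve(query, methods, budget, skip=lambda q, m: False):
    results, invoked = set(), []
    remaining = sorted(methods, key=lambda m: m.expected_precision, reverse=True)
    while remaining:
        # Step 630: pick the most precise method that is affordable and not skipped.
        candidates = [m for m in remaining if m.cost <= budget and not skip(query, m)]
        if not candidates:
            break                     # insufficient resources for any remaining method
        method = candidates[0]
        remaining.remove(method)
        budget -= method.cost
        results |= method.run(query)  # step 640
        invoked.append(method.name)
        if results:                   # step 650: assume any tag is an acceptable result
            break
    return results, invoked           # step 690
```

Under these assumptions, Query 1 is resolved by the grammar stub alone, Query 2 falls through to the Bayesian network stub, and Query 3 with a budget of 3 units and the heuristic enabled goes directly to the annotated term stub.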


Refinement of the Knowledge Specification Using 
Machine Learning 

The present invention includes methods for automatically
refining the single main knowledge specification 180 as the
invention handles more natural language queries. In a
preferred embodiment, the present invention modifies the
probabilities in the single main knowledge specification 180
using techniques for learning Bayesian networks. In one
embodiment, both the single main knowledge specification
180 and the Bayesian network knowledge base 154 are
modified simultaneously.

For example, if the present invention continually receives
the query "How do I get reimbursed to surf the web?", the
following probabilities will increase:

P(<paid>="reimbursed" | <get_paid1>="get <paid>")
P(<get_paid1>="get <paid>" | <get_paid>=present)
P(<get_paid>=present | <How_do_I_get_paid_to_surf_the_web>=present).
Any number of methods can be used to refine probabilities
in the single main knowledge specification 180 and the
corresponding knowledge base for the Bayesian network
method 153. For example, see D. Heckerman, D. Geiger and D.
M. Chickering, "Learning Bayesian Networks: The Combination
of Knowledge and Statistical Data," Proc. 10th Conf.
Uncertainty in Artificial Intelligence, Morgan Kaufmann
Publishers, San Francisco, Calif., 1994, pp. 293-301.
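
As a rough illustration of how repeated queries could raise a probability, the sketch below re-estimates the weights of one grammar rule from observed alternatives. It is a plain pseudo-count update, not the method of the cited reference, and the function name, the `prior_strength` parameter, and the starting weights are assumptions. The starting weights follow the <paid> rule of FIG. 2B, whose value for "reimbursed" is illegible in the source and is assumed here to be 0.1.

```python
def update_weights(weights, observations, prior_strength=10.0):
    """Blend the grammar weights (treated as pseudo-counts) with observed counts."""
    counts = {alt: 0 for alt in weights}
    for alt in observations:
        if alt in counts:             # ignore words that are not alternatives of this rule
            counts[alt] += 1
    total = sum(counts.values())
    return {
        alt: (prior_strength * w + counts[alt]) / (prior_strength + total)
        for alt, w in weights.items()
    }


# Assumed starting weights for the <paid> rule.
PAID = {"paid": 0.8, "reimbursed": 0.1, "rewarded": 0.1}

# Ten queries that all used the word "reimbursed".
UPDATED = update_weights(PAID, ["reimbursed"] * 10)
```

With a prior strength of 10 and ten observations, the weight of "reimbursed" rises from 0.1 to (1 + 10) / 20 = 0.55 while the weights still sum to one. A larger `prior_strength` would make the grammar author's original weights harder to override.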

Use of Partial Results 


One embodiment of the present invention uses results
computed in one natural language query resolution method
as inputs into subsequently invoked natural language query
resolution methods. Specifically, referring to FIG. 1, the
metareasoner 160 may invoke a first natural language query
resolution method and then store the results from that first
natural language query resolution method in partial results
177. Then, the metareasoner 160 may subsequently invoke
additional natural language query resolution methods and
provide the partial results 177 as input parameters to help
those subsequent natural language query resolution methods.

Under certain conditions, the parse tree based grammar
query resolution method 151 is invoked by the metareasoner
160. If the parse tree based grammar query resolution
method 151 fails to return a parse of the complete natural
language query, it can return one or more parses of part of
the natural language query. For example, the expansion of
the <to_surf_the_web> appearing in FIG. 3 can be
changed to:

<to_surf_the_web> = to /0.9/ <surf> <article> /0.8/
<web> {PartialTag1};

If the incoming query is "Can you tell me how I can get
rewarded to surf the Internet?" the modified grammar in
FIG. 2B would return PartialTag1. An embodiment of the
present invention maintains a list of the associations
between partial tags and nodes in the Bayesian network. In
this example, PartialTag1 corresponds to the node in FIG. 3
labeled "<to_surf_the_web>". Thus, the invention can set
the value of this node to present. The invention also
recognizes which parts of the natural language query correspond
to the partial result.

In this example, the parse tree based grammar query
resolution method 151 returns PartialTag1 and the portion of
the natural language query that matched the partial result,
namely "get rewarded to surf the Internet". The metareasoner
160 stores this information in the partial results
knowledge base 177. When the metareasoner 160 calls the
Bayesian network method 153, the metareasoner 160 passes
the string "Can you tell me how I can" and the partial result
PartialTag1 to the network.

Use of partial results in this fashion increases the
precision of the method using the partial result. For example,
setting the node "<to_surf_the_web>" to present will
increase the probability of
<How_do_I_get_paid_to_surf_the_web> more than setting the
nodes "surf" and "web" to present.

An embodiment of the present invention passes partial
results obtained from the parse tree based grammar query
resolution method 151 to the search engine based methods
155 and 157. The phrases associated with the partial results
tag that are stored in the partial results knowledge base 177
are sent to the search engines 155 and 157 along with the
portion of the natural language query that was not associated
with the partial result. In the previous example, if the
metareasoner 160 invokes the annotated term search engine 155
after invoking the parse tree based grammar method 151, the
metareasoner 160 sends the string "Can you tell me how I can"
and the string "get rewarded to surf the internet" to the
search engine. Note that the second string is intended to be
sent to the search engine as a quoted string. In an
embodiment of the present invention, the search engine will
use the quoted string to search for an exact match of the
quoted string, thereby increasing the precision of the search.
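
Assembling the two strings sent to the search engine can be sketched as below. The function name is hypothetical, and the convention that double quotes request an exact phrase match is an assumption about the search engine's query syntax rather than something stated in this disclosure.

```python
def build_search_query(query, matched_phrase):
    """Combine the unmatched remainder with the partial-result phrase in quotes."""
    idx = query.lower().find(matched_phrase.lower())
    if idx < 0:
        return query.strip()          # no partial result found: send the query unchanged
    end = idx + len(matched_phrase)
    # Everything outside the matched span, with stray punctuation and spacing removed.
    remainder = " ".join((query[:idx] + " " + query[end:]).split()).strip(" ?.!")
    quoted = f'"{query[idx:end]}"'    # exact-phrase portion
    return f"{remainder} {quoted}".strip()


SEARCH_QUERY = build_search_query(
    "Can you tell me how I can get rewarded to surf the Internet?",
    "get rewarded to surf the Internet",
)
```

For the example above, `SEARCH_QUERY` is the unmatched string followed by the matched phrase in double quotes, which is the form a phrase-aware search engine would need to perform the exact match.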

The foregoing disclosure has described a multi-tiered 
natural language query resolution system. It is contemplated 
that changes and modifications may be made by one of 
ordinary skill in the art, to the materials and arrangements of 
elements of the present invention without departing from the 
scope of the invention. 

We claim: 

1. A computer implemented method of processing natural
language queries, said method comprising: 

invoking a metareasoning module to obtain a set of 
expected result metrics for a plurality of different 
natural language query resolution methods, each of said 
different natural language query resolution methods 
capable of returning a final natural language query 
result; 

selecting a natural language query resolution method from 
said plurality of different natural language query reso- 
lution methods dependent on said set of expected result 
metrics for said plurality of different natural language 
query resolution methods; and 

invoking said selected natural language query resolution 
method to obtain actual natural language query results. 

2. The method of processing natural language queries as 
claimed in claim 1 further comprising: 

returning at least one tag or pointer to a section of an 
information corpus from said actual natural language 
query results. 

3. The method of processing natural language queries as 
claimed in claim 1 wherein selecting a natural language 
query resolution method is further dependent upon addi- 
tional factors. 
4. The method of processing natural language queries as
claimed in claim 3 wherein one of said additional factors 
comprises a current load. 

5. The method of processing natural language queries as 
claimed in claim 1 wherein said method repeats selecting 
said natural language query resolution method and invoking 
said selected natural language query resolution method until 
a set of desired natural language query results are fulfilled by 
said actual natural language query results. 

6. The method of processing natural language queries as 
claimed in claim 1, said method further comprising: 

analyzing said actual natural language query results; 

selecting a next natural language query resolution method 
dependent on said actual natural language query results 
from at least one previous natural language query 
resolution method. 

7. The method of processing natural language queries as 
claimed in claim 6 wherein said method repeats selecting said
natural language query resolution method dependent on said 
set of expected result metrics and invoking said selected 
natural language query resolution method until a set of 
desired natural language query results are fulfilled by said 
actual natural language query results. 

8. The method of processing natural language queries as 
claimed in claim 6 wherein selecting said next natural 
language query resolution method is further dependent upon 
additional factors. 

9. The method of processing natural language queries as 
claimed in claim 8 wherein one of said additional factors 
comprises a current load. 

10. The method of processing natural language queries as 
claimed in claim 1, said method further comprising: 

selecting a next natural language query resolution method 
from said plurality of different natural language query 
resolution methods; and 

invoking said next natural language query resolution 
method wherein said next natural language query reso- 
lution method is provided with a set of partial results 
from said previous selected natural language method. 

11. The method of processing natural language queries as 
claimed in claim 1 wherein said method further comprises: 

adjusting a Bayesian network knowledge base used by a 
natural language query resolution method after a natu- 
ral language query has been processed. 

12. A computer implemented method of processing
natural language queries, said method comprising:

invoking a first natural language query resolution method 
from a plurality of different natural language query 
resolution methods to obtain partial results, each of said 
different natural language query resolution methods in 
said plurality of different natural language query reso- 
lution methods capable of returning a final natural 
language query result; and 

invoking a next natural language query resolution method 
from said plurality of different natural language query 
resolution methods using said partial results from said 
first natural language query resolution method. 

13. The method of processing natural language queries as 
claimed in claim 12 further comprising: 

invoking a metareasoning module to obtain a set of 
expected result metrics for a plurality of natural lan- 
guage query resolution methods; and 

selecting said first natural language query resolution 
method from a plurality of natural language query 
resolution methods dependent on said set of expected 
result metrics. 



14. The method as claimed in claim 13 wherein selecting 
said natural language query resolution method is further 
dependent upon additional factors. 

15. The method as claimed in claim 12 further comprising:

analyzing said partial results; 

selecting said next natural language query resolution 
method dependent on said partial results from said first 
natural language query resolution method. 

16. A computer implemented natural language processing 
system for processing a natural language query, said natural 
language processing system comprising: 

a single main knowledge representation; 
a first natural language query resolution method for 
analyzing said natural language query, said first natural 
language query resolution method capable of returning 
a first final natural language query result; 
a first knowledge base for said first natural language query 
resolution method, said first knowledge base derived 
from said single main knowledge representation; 
a second natural language query resolution method for 
analyzing said natural language query, said second 
natural language query resolution method using a sec- 
ond knowledge base; 
a second knowledge base for said second natural language 
query resolution method, said second knowledge base 
derived from said single main knowledge 
representation, said second natural language query 
resolution method capable of returning a second final 
natural language query result; and 
a metareasoner for invoking said first or second natural 
language query resolution methods based upon 
expected result metrics. 

17. The natural language processing system as claimed in 
claim 16 wherein said single main knowledge base com-
prises a set of patterns and an expansion of said set of 
patterns. 

18. The natural language processing system as claimed in 
claim 17 further comprising: 

a derivation system, said derivation system generating an 
uncertainty based reasoning system based upon said set 
of patterns and said expansion of said set of patterns. 

19. The natural language processing system as claimed in 
claim 16 wherein said single main knowledge base com- 
prises a grammar. 

20. The natural language processing system as claimed in 
claim 19 wherein said first natural language query resolution 
method comprises a grammar based method and said first 
knowledge base comprises a parse tree derived from said 
regular grammar. 

21. The natural language processing system as claimed in 
claim 19 wherein said first natural language query resolution 
method comprises a Bayesian network based method and 
said first knowledge base comprises a Bayesian network 
derived from said regular grammar. 

22. The system as claimed in claim 16 further comprising: 
a refinement system, said refinement system improving 

said single main knowledge representation based upon 
results from a final result from a natural language 
query. 
23. A computer implemented method of processing natural
language queries, said method comprising:
consulting a metareasoning knowledge base to compare 
expected metrics for a plurality of natural language 
query resolution methods, each of said different natural 
language query resolution methods in said plurality of 
different natural language query resolution methods 
capable of returning a final natural language query 
result; 

selecting a natural language query resolution method from 
said plurality of natural language query resolution 
methods dependent on information from said metarea- 
soning knowledge base; and 
invoking said selected natural language query resolution
method to obtain actual results. 

24. The method of processing natural language queries as 
claimed in claim 22 wherein said expected metrics comprise 
expected recall value. 

25. The method of processing natural language queries as 
claimed in claim 23 wherein said expected metrics comprise 
expected precision value. 

26. The method of processing natural language queries as 
claimed in claim 25 wherein said method successively 
invokes natural language query resolution methods in an 
order of decreasing precision values. 